Sentiment Hackpad

Authors:

Daniela Huppenkothen, Phil Marshall, Madhura Killedar

We performed sentiment analysis, a simple natural language processing task, on the text of the 2016 AstroHackWeek Hackpad.



In [22]:

    
!pip install textblob

Requirement already satisfied (use --upgrade to upgrade): textblob in /Users/discworld/miniconda3/lib/python3.5/site-packages
Requirement already satisfied (use --upgrade to upgrade): nltk>=3.1 in /Users/discworld/miniconda3/lib/python3.5/site-packages (from textblob)



In [23]:

    
from __future__ import unicode_literals, print_function
import textblob
import pandas as pd
import numpy as np

Test data

As a quick test, we feed some text into the TextBlob sentiment analyzer.

Polarity ranges from -1 to 1:

-1 reflects extremely negative associations
1 reflects extremely positive associations
0 is neutral language
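To make the scale concrete, here is a minimal sketch that turns a polarity score into a coarse label. The function name `polarity_label` and the `neutral_band` tolerance are hypothetical choices for illustration, not part of TextBlob; the scores fed in are the two values TextBlob returns for the test sentences in this notebook, plus an exactly neutral 0.

```python
def polarity_label(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a coarse label.

    `neutral_band` is a hypothetical tolerance: scores within
    +/- neutral_band of 0 are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Scores from the two test sentences, plus a neutral example
for score in (-0.8, 0.5, 0.0):
    print(score, polarity_label(score))
```

The width of the neutral band is a judgement call; TextBlob itself only reports the number.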



In [24]:

    
textblob.TextBlob("Hello World I hate you").sentiment.polarity

    Out[24]:

-0.8



In [25]:

    
textblob.TextBlob("Hello World I love you").sentiment.polarity

    Out[25]:

0.5

Hackpad data

Read data

To do: Find a more automatic text-scraping method



In [26]:

    
# pick one of the exported Hackpad snapshots
#textfile = "../hackpadtext_Wed.txt"
textfile = "../hackpadtext_Thu.txt"
#textfile = "../hackpadtext_Thu_active.txt"



In [27]:

    
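# read the exported Hackpad text, one chunk per non-empty line;
# newer pandas versions refuse sep="\n" in read_csv, so read the lines directly
with open(textfile, encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]
rawdata = pd.DataFrame({"text": lines})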

Analyse

Analyse and store the polarity of each chunk of text (one line of the exported file).



In [28]:

    
# initialise a float column to hold the polarity of each chunk
rawdata["polarity"] = np.zeros(len(rawdata))



In [29]:

    
# analyse each chunk (hack idea or comment) and store its polarity
feelings = []
for i in rawdata.index:
    data = rawdata.loc[i, "text"]
    polarity = textblob.TextBlob(data).sentiment.polarity
    rawdata.loc[i, "polarity"] = polarity
    feelings.append(polarity)

How happy are we on average?



In [30]:

    
average_feels = sum(feelings)/len(feelings)
print(average_feels)

0.180728006749
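The average above is simply the arithmetic mean of the per-chunk polarities. A minimal standard-library sketch of the same calculation, assuming a plain list of scores, follows; the median is added only as a hypothetical companion statistic that is less sensitive to a few strongly worded chunks. The input values are the two test-sentence scores from earlier, not Hackpad data.

```python
import statistics


def summarize_polarity(scores):
    """Return (mean, median) of a non-empty list of polarity scores."""
    if not scores:
        raise ValueError("need at least one score")
    # Same quantity as sum(feelings) / len(feelings) in the notebook
    mean = statistics.mean(scores)
    # Median: a more robust summary when a few chunks are extreme
    median = statistics.median(scores)
    return mean, median


mean, median = summarize_polarity([-0.8, 0.5])
print(mean, median)
print("happy" if mean > 0 else "not happy")
```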



In [31]:

    
if average_feels>0:
    print("Yay, we're happy! wooooooooooo!")
else:
    print("oh no not happy jan")

Yay, we're happy! wooooooooooo!

Who sounds sad?



In [32]:

    
# search for sad hacks
rawdata[rawdata["polarity"]<0]

    Out[32]:

	text	polarity
0	Active Projects:	-0.133333
1	Move your project up here if it is being activ...	-0.166667
12	AstroHackWeek image Gallery - (Arna) Image gal...	-0.0375
21	Deprojecting Galaxies (or molecular structure)...	-0.0111111
26	Here's my ongoing failure in notebook form	-0.316667
35	Classifying the pulse shapes of pulsars using ...	-0.0218182
36	A custom Monte Carlo sampler for the Kepler pr...	-0.225952
40	Making MCMC fail on problems with implicit, fl...	-0.00625
50	Classifying the pulse shapes of pulsars using ...	-0.0218182
60	Modelling 2-D Impulse Response Function for Ac...	-0.129167
73	Managing Large Scale Structure Data with Datab...	-0.0111772
84	Neural Networks (Zaki Ali) - I'm working on a ...	-0.148864
92	Start with a single species (say FeII), conver...	-0.0107143
102	Bayesian networks for inference of young star ...	-0.11
105	Python API to perform SDSS SQL Queries: Sky Se...	-0.09375
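Outside pandas, the boolean-mask filter `rawdata[rawdata["polarity"]<0]` amounts to keeping the rows whose score falls below zero. Here is a minimal sketch with plain tuples, assuming rows are stored as `(index, polarity)` pairs; the sample values are copied from the tables in this notebook. Unlike the notebook cell, this version also sorts the result, most negative first, and the name `sad_rows` is hypothetical.

```python
def sad_rows(rows, threshold=0.0):
    """Return (index, polarity) pairs with polarity below `threshold`,
    sorted from most to least negative."""
    sad = [(idx, pol) for idx, pol in rows if pol < threshold]
    return sorted(sad, key=lambda row: row[1])


# (index, polarity) pairs taken from the notebook outputs
rows = [(26, -0.316667), (100, 0.875), (36, -0.225952), (62, 0.6875), (0, -0.133333)]
print(sad_rows(rows))
```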



In [33]:

    
# search for happy hacks
#rawdata[rawdata["polarity"]>0]



In [34]:

    
# Top Five Happy Hacks!
rawdata.sort_values("polarity")[::-1][:5]

    Out[34]:

	text	polarity
7	Tips and Tricks for Teaching with Jupyter Note...	1
100	Lunch sounds good!	0.875
111	happy to chat about uncertainty and implementi...	0.8
98	A good point of reference: streams. Hope to jo...	0.7
62	Sure, sounds good!	0.6875

Wait... most of those sound like comments, not hacks!

Hackpad data (filtering out short comments)

Now we assume (and hope) that a chunk of text with more than 20 words describes an actual hack project rather than a comment. This heuristic isn't always right, so there's room for improvement.
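The word-count heuristic can be written as a small standalone function. This is a sketch, with `is_probably_hack` and `min_words` as hypothetical names. One deliberate change from the notebook: it uses `str.split()` with no argument, which splits on runs of whitespace, whereas the notebook's `split(" ")` counts an empty string for every extra space (for example, the double space in "Mike Baumer,  Usman Khan") and can therefore slightly overcount.

```python
def is_probably_hack(text, min_words=20):
    """Treat a chunk as a hack idea if it has more than `min_words` words."""
    # split() with no argument ignores repeated whitespace,
    # so double spaces do not create empty "words"
    return len(text.split()) > min_words


chunks = [
    "Lunch sounds good!",
    "Sure, sounds good!",
    " ".join(["word"] * 21),  # hypothetical 21-word chunk
]
print([is_probably_hack(c) for c in chunks])

# The notebook's split(" ") counts the empty string between two spaces
print(len("a  b".split(" ")), len("a  b".split()))
```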



In [35]:

    
# initialise a boolean column; True will mark probable hacks
rawdata["mask"] = False



In [36]:

    
# flag chunks with more than 20 words as probable hacks
for i in rawdata.index:
    if len(rawdata.loc[i,"text"].split(" "))>20:
        rawdata.loc[i,"mask"] = True
    else:
        rawdata.loc[i,"mask"] = False

The new dataset includes only (probable) hacks, not comments.



In [37]:

    
hackdata = rawdata[rawdata["mask"]]



In [38]:

    
# Top Five Sad Actually-Hacks (probably)
hackdata.sort_values("polarity")[:5]

    Out[38]:

	text	polarity	mask
36	A custom Monte Carlo sampler for the Kepler pr...	-0.225952	True
84	Neural Networks (Zaki Ali) - I'm working on a ...	-0.148864	True
60	Modelling 2-D Impulse Response Function for Ac...	-0.129167	True
102	Bayesian networks for inference of young star ...	-0.11	True
105	Python API to perform SDSS SQL Queries: Sky Se...	-0.09375	True



In [39]:

    
# Top Five Happy Actually-Hacks (probably)
hackdata.sort_values("polarity")[::-1][:5]

    Out[39]:

	text	polarity	mask
7	Tips and Tricks for Teaching with Jupyter Note...	1	True
6	Gaussian Process Tutorial (Jake/Phil) We start...	0.625	True
95	Long-shot: if we finish the automatic velocity...	0.5	True
39	Create color palettes for custom queries (Adri...	0.5	True
30	collaboratr (Mike Baumer,  Usman Khan, Casey L...	0.5	True
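`sort_values("polarity")[::-1][:5]` sorts ascending, reverses, and keeps the first five rows; pandas also provides `DataFrame.nlargest(5, "polarity")` and `DataFrame.nsmallest` for exactly this. The same idea in the standard library uses `heapq.nlargest` and `heapq.nsmallest`, sketched below on `(index, polarity)` pairs copied from the filtered-hack tables. The helper name `top_n` is hypothetical, and tied scores (the three 0.5 values) can come out in a different order depending on which method is used.

```python
import heapq


def top_n(rows, n=5, largest=True):
    """Return the n (index, polarity) pairs with the highest
    (or, if largest=False, the lowest) polarity."""
    pick = heapq.nlargest if largest else heapq.nsmallest
    return pick(n, rows, key=lambda row: row[1])


# (index, polarity) pairs copied from the filtered-hack tables
rows = [(7, 1.0), (6, 0.625), (95, 0.5), (39, 0.5), (30, 0.5),
        (36, -0.225952), (84, -0.148864), (60, -0.129167)]
print(top_n(rows, n=3))
print(top_n(rows, n=3, largest=False))
```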

Repeat the earlier analysis on the filtered hacks



In [40]:

    
# recompute polarity for the filtered hacks only
moarfeelings = []
for i in hackdata.index:
    data = hackdata.loc[i, "text"]
    polarity = textblob.TextBlob(data).sentiment.polarity
    moarfeelings.append(polarity)



In [41]:

    
average_feels = sum(moarfeelings)/len(moarfeelings)
print(average_feels)

0.168875721776



In [42]:

    
if average_feels>0:
    print("YAY, WE'RE ACTUALLY HAPPY! wooooooooooo!")
else:
    print("oh no we're actually sad")

YAY, WE'RE ACTUALLY HAPPY! wooooooooooo!


